Jaime Carbonell (Chair), Tom Mitchell
Abstract
The development of modern information technology has enabled the collection of data of unprecedented size and complexity. Examples include web text, microarray and proteomics data, and data from scientific domains such as meteorology. When learning from such high-dimensional and complex data, traditional machine learning techniques often suffer from the curse of dimensionality and prohibitive computational cost. Yet learning from large-scale high-dimensional data offers large payoffs in text mining, gene analysis, and numerous other consequential tasks. Recently developed sparse learning techniques provide a suite of tools for understanding and exploring high-dimensional data from many areas of science and engineering. By exploiting sparsity, we can learn a parsimonious, compact model that is more interpretable and computationally tractable at application time. When the underlying model is known to be sparse, sparse learning methods can yield a more consistent model and much improved prediction performance. However, existing methods are still insufficient for modeling complex or dynamic structures in the data, such as those evidenced in pathways of genomic data, gene regulatory networks, and synonyms in text data. This thesis develops structured sparse learning methods, along with scalable optimization algorithms, to explore and predict high-dimensional data with complex structures. In particular, we address three aspects of structured sparse learning:
1. Efficient and scalable optimization methods with fast convergence guarantees for a wide spectrum of high-dimensional learning tasks, including single- or multi-task structured regression, canonical correlation analysis, and online sparse learning.
2. Learning dynamic structures of different types of undirected graphical models, e.g., conditional Gaussian or conditional forest graphical models.
3. Demonstrating the usefulness of the proposed methods in various applications, e.g., computational genomics and spatio-temporal climatological data.
In addition, we design specialized sparse learning methods for text mining applications, including ranking and latent semantic analysis. In the last part of the thesis, we present future directions for high-dimensional structured sparse learning from both computational and statistical perspectives.
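The claim that exploiting sparsity yields a parsimonious, compact model can be illustrated with the simplest sparse learner, the lasso (L1-penalized least squares), solved by proximal gradient descent (ISTA). This is a minimal sketch under our own assumptions: the function names `soft_threshold` and `lasso_ista`, the penalty weight `lam`, the iteration count, and the synthetic data are hypothetical, and plain lasso is a standard unstructured baseline, not one of the structured sparse learning methods or optimization algorithms developed in the thesis.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||v||_1: shrink each entry toward zero by t."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_ista(X, y, lam, n_iter=1000):
    """Hypothetical helper: minimize (1/(2n)) * ||y - X w||^2 + lam * ||w||_1 with ISTA."""
    n, p = X.shape
    # Step size 1/L, where L = sigma_max(X)^2 / n is the Lipschitz constant
    # of the gradient of the least-squares term.
    L = np.linalg.norm(X, 2) ** 2 / n
    w = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y) / n               # gradient of the smooth part
        w = soft_threshold(w - grad / L, lam / L)  # gradient step, then L1 shrinkage
    return w

# Synthetic example (assumed data): only 3 of 50 features influence y.
rng = np.random.default_rng(0)
n, p = 100, 50
X = rng.standard_normal((n, p))
w_true = np.zeros(p)
w_true[:3] = [2.0, -3.0, 1.5]
y = X @ w_true + 0.1 * rng.standard_normal(n)

lam = 0.1
w_hat = lasso_ista(X, y, lam)
print("nonzero coefficients:", np.count_nonzero(w_hat))
print("first three estimates:", np.round(w_hat[:3], 2))
```

Because the shrinkage step sets small coefficients exactly to zero, the fitted model typically keeps only a few of the 50 features. Structured variants replace the plain L1 penalty with penalties over groups or graphs of features, whose proximal operators can be harder to compute, for example when groups overlap.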
Similar resources
From Data to Knowledge to Action: Enabling Advanced Intelligence and Decision-Making for America’s Security
Large-scale machine learning can fundamentally transform the ability of intelligence analysts to efficiently extract important insights relevant to our nation’s security from the vast amounts of intelligence data being generated and collected worldwide. Intelligence organizations can tap into rapid data analytics innovations that Internet industries and university research organizations are mak...
AI Magazine Cumulative Index -- Volumes 1-4
Davis, Randall. Expert Systems: Where are we? And where do we go from here?
CMU Report on TDT-2: Segmentation, Detection and Tracking
This paper reports the results achieved by Carnegie Mellon University on the Topic Detection and Tracking Project’s second-year evaluation for the segmentation, detection, and tracking tasks. Additional post-evaluation improvements are also...
CMU Approach to TDT-2: Segmentation, Detection, and Tracking
This paper reports the results achieved by Carnegie Mellon University on the Topic Detection and Tracking Project’s second-year evaluation for the segmentation, detection, and tracking tasks. Additional post-evaluation improvements are also...
Learning from Solution Paths: An Approach to the Credit Assignment Problem
In this article, we discuss a method for learning useful conditions on the application of operators during heuristic search. Since learning is not attempted until a complete solution path has been found for a problem, credit for correct moves and blame for incorrect moves is easily assigned. We review four learning systems that have incorporated similar techniques to learn in the domains of algebr...